Rows: 2,639
Columns: 13
$ FIRST.NAME <chr> "TOMASA", "ANNIE", "OLIVE", "CHERRY", "LEON", "VICTOR…
$ LAST.NAME <chr> "ARMEN", "", "ANCY", "AQUILAR", "ABOULAHOUD", "", "AG…
$ SEX <chr> "F", "F", "F", "F", "M", "F", "M", "M", "F", "F", "F"…
$ DOJ <chr> "5-18-2014", "", "7-28-2014", "04-03-2013", "11-20-20…
$ CURRENT.DATE <chr> "01-07-2016", "01-07-2016", "01-07-2016", "01-07-2016…
$ DESIGNATION <chr> "Analyst", "Associate", "Analyst", "Analyst", "Analys…
$ AGE <int> 21, NA, 21, 22, NA, 22, 22, NA, 28, 22, 24, 36, 24, 2…
$ SALARY <int> 44570, 89207, 40955, 45550, 43161, 48736, 40339, 4005…
$ UNIT <chr> "Finance", "Web", "Finance", "IT", "Operations", "Mar…
$ LEAVES.USED <int> 24, NA, 23, 22, 27, 20, 19, 29, 20, 15, 22, NA, 15, 2…
$ LEAVES.REMAINING <int> 6, 13, 7, 8, 3, 10, 11, 1, 10, 15, 8, 11, 15, 7, 0, N…
$ RATINGS <int> 2, NA, 3, 3, NA, 4, 5, 2, 3, 3, 4, 2, 5, 2, 4, 5, 3, …
$ PAST.EXP <int> 0, 7, 0, 0, 3, 0, 0, 2, 1, 0, 1, 9, 1, 0, 1, 0, 2, 1,…
---
title: "Analyzing salary amounts"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: minty
      #navbar-bg: "purple"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
library(DT) #needed with datatable()
library(tidyverse)
library(plotly) #needed with ggplotly()
salary<-read.csv("./salaryprediction.csv")
```
Introduction
===
Column {data-width=550}
---
#### Analyzing salary amounts of business professionals using data analytics
#### Introduction
My project will be “Analyzing Salary Amounts of Business Professionals using Data Analytics.” For this project, I will use a dataset from Kaggle called “Salary of data professions.” Kaggle has pre-processed the data in this dataset to identify salary information about business professionals.
Column {.tabset data-width=450}
---
### Research Questions
- Research questions
    - What are the highest-paying areas of business in which to work as a data professional?
    - Do men have certain advantages in data science fields?
    - Are women disproportionately paid in data science fields?
Background
===
#### Why I chose this topic
I chose this topic because after I graduate, I plan to use my data analytics minor in my career so I can combine my understanding of accounting with data analytics. The topic of gender pay gaps is relevant to my life because, as a woman going into the mostly male-dominated accounting field, I am interested in whether data analytics is also a male-dominated field.
#### Research Plan
I will use descriptive statistics to summarize the main features of the data. Next, I will use data visualization techniques to visualize patterns I see in the data. These graphs will include bar charts, scatter plots, and boxplots used to help identify trends. I am going to use regression analysis to identify relationships between variables. I am very interested in discovering how all variables connect and seeing direct and indirect relationships between variables.
Data
===
Column {data-width=550}
---
### <b><font size = 4><span Style = "color:blue">First 500 observations</span></font></b>
```{r show_table}
datatable(salary[1:500,],rownames=FALSE,colnames=c("first name","last name","sex","date of join","current date","designation","age","salary","unit","leaves used","leaves remaining","ratings","past experience"),options=list(pageLength=20))
```
Column {data-width=450}
---
### <font size = 4><span Style = "color:red">Variables</span></font>
In order to analyze differences between concentrations and salary amounts of different data professionals, we need to understand what each variable means and how it is relevant to the study.
- First Name: first name of employee
- Last Name: last name of employee
- Sex: sex of employee
- Date of Join: date the employee started working the job
- Current Date: current date
- Designation: rank of the employee
- Age: age of employee
- Salary: annual salary of the employee
- Unit: business unit of the employee
- Leaves Used: number of leaves used by the employee
- Leaves Remaining: number of leaves remaining for the employee
- Ratings: performance ratings of the employee
- Past Experience: past work experience before joining the current company
```{r}
glimpse(salary)
```
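
The variable overview above is easier to interpret once the amount of missing data is known, since numeric columns such as `AGE` can hold `NA` and text columns such as `LAST.NAME` can hold empty strings. The following is a minimal sketch in base R that counts both per column. The helper name `count_missing` and the `toy` data frame are hypothetical and used only for illustration; in the dashboard the helper would be applied to `salary` instead.

```r
# Hypothetical helper (not part of the original dashboard):
# counts NA values in every column, plus empty strings in character columns.
count_missing <- function(df) {
  vapply(df, function(col) {
    n_na <- sum(is.na(col))
    n_blank <- if (is.character(col)) sum(col == "", na.rm = TRUE) else 0L
    as.integer(n_na + n_blank)
  }, integer(1))
}

# Toy rows (illustrative only) with the same column types as the salary data.
toy <- data.frame(
  LAST.NAME = c("ARMEN", "", "ANCY"),
  AGE = c(21L, NA, 21L),
  SALARY = c(44570L, 89207L, 40955L),
  stringsAsFactors = FALSE
)

missing_counts <- count_missing(toy)
print(missing_counts)
```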
Business Unit Frequencies{data-orientation=rows}
===
### **Bar Chart**
```{r bar1}
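# par() only affects base R graphics, so these settings do not change the ggplot below.
par(mgp=c(4,1,0)) # the graphical parameter is `mgp`; `mpg` is not a valid parameter
par(mar=c(5,7,4,2))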
unit_bar_plot<-ggplot(salary,aes(x=UNIT))+
geom_bar()+
labs(title="number of data professionals by business unit",x="business unit",y="number of professionals")
unit_bar_plot
```
Row{data-height=350}
---
According to this bar plot, the business unit with the most data professionals is IT. The business unit with the fewest data professionals is management.
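
The reading of the bar plot can be cross-checked with a frequency table. This is a small sketch in base R, assuming a character vector of unit names; `toy_units` is made-up data, and in the dashboard the vector would be `salary$UNIT`.

```r
# Toy unit labels (illustrative only); the real input would be salary$UNIT.
toy_units <- c("IT", "Finance", "IT", "Web", "IT", "Finance", "Management", "Web")

# Count employees per unit and sort from most to least common.
unit_counts <- sort(table(toy_units), decreasing = TRUE)
print(unit_counts)

# First and last names of the sorted table give the extremes.
most_common <- names(unit_counts)[1]
least_common <- names(unit_counts)[length(unit_counts)]
```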
Highest-paying business units {data-orientation=columns}
===
Column {data-width=500}
---
#### Highest-paying business units
My first research question is: What are the highest-paying areas of business in which to work as a data professional? To answer this question, I am going to create side-by-side box plots of senior analyst salaries in every business unit.
#### Analysis
The side-by-side box plots clearly show that the highest-paying fields to work in as a senior data analyst are operations and IT.
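
A numeric summary can back up what the box plots show. The sketch below computes the median senior analyst salary per unit in base R and sorts the units from highest to lowest. The `toy` data frame is made up for illustration; the column names match the dataset, and in the dashboard `salary` would take its place.

```r
# Toy data (illustrative only) using the dataset's column names.
toy <- data.frame(
  DESIGNATION = c("Senior Analyst", "Senior Analyst", "Senior Analyst",
                  "Senior Analyst", "Analyst"),
  UNIT = c("IT", "IT", "Finance", "Finance", "IT"),
  SALARY = c(70000, 74000, 60000, 64000, 45000)
)

# Keep senior analysts only, then take the median salary within each unit.
senior <- subset(toy, DESIGNATION == "Senior Analyst")
unit_medians <- aggregate(SALARY ~ UNIT, data = senior, FUN = median)

# Sort so the highest-paying unit comes first.
unit_medians <- unit_medians[order(-unit_medians$SALARY), ]
print(unit_medians)
```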
Column {data-width=500}
---
```{r}
analystsalary<-salary%>%filter(DESIGNATION=="Senior Analyst")
unit_salary<-ggplot(analystsalary,aes(x=UNIT,y=SALARY))+geom_boxplot(fill="blue",color="darkblue")+labs(title="boxplot of salary by unit",x="unit",y="salary")
unit_salary
```
Salary by gender in business units{data-orientation=columns}
===
Column {data-width=500}
---
#### Filtering
To compare the salaries of men and women by business unit, we first need to filter the dataset into smaller datasets. Filtering ensures that every employee in a comparison is from the same business unit and has the same designation, so that differences in unit or rank do not distort the comparison. After we filter the dataset into smaller ones, we can create boxplots to compare the salaries of entry-level female data analysts and entry-level male data analysts.
```{r}
# One filtered data frame per business unit, restricted to entry-level analysts
finance<-salary%>%filter(UNIT=="Finance",DESIGNATION=="Analyst")
marketing<-salary%>%filter(UNIT=="Marketing",DESIGNATION=="Analyst")
management<-salary%>%filter(UNIT=="Management",DESIGNATION=="Analyst")
operations<-salary%>%filter(UNIT=="Operations",DESIGNATION=="Analyst")
web<-salary%>%filter(UNIT=="Web",DESIGNATION=="Analyst")
IT<-salary%>%filter(UNIT=="IT",DESIGNATION=="Analyst")
```
#### Analysis
The business units with the largest differences in entry-level salaries between men and women are marketing and operations. The graphs show that men make significantly more than women while working as data analysts in the marketing field.
In contrast, the graphs also show that women make more than men while working as data analysts in operations. While there are small salary differences between men and women in the other graphs, they are not large enough to conclude that the differences in salaries are due to gender.
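
Whether a difference seen in a box plot is statistically significant can be checked with a two-sample test. The sketch below runs a Welch t-test of salary by sex within one unit, using base R's `t.test()`. The helper `compare_salary_by_sex` and the `toy` data are hypothetical and not part of the dashboard, and the choice of a t-test is an assumption; a rank-based test such as `wilcox.test()` may suit skewed salaries better. Testing six units separately also raises the chance of a false positive, so the p-values should be read with caution.

```r
# Hypothetical helper: Welch t-test of SALARY by SEX for one unit and designation.
compare_salary_by_sex <- function(df, unit, designation = "Analyst") {
  # which() drops rows where UNIT or DESIGNATION is NA
  sub <- df[which(df$UNIT == unit & df$DESIGNATION == designation), ]
  test <- t.test(SALARY ~ SEX, data = sub)  # unequal variances by default
  data.frame(
    unit = unit,
    mean_F = mean(sub$SALARY[sub$SEX == "F"], na.rm = TRUE),
    mean_M = mean(sub$SALARY[sub$SEX == "M"], na.rm = TRUE),
    p_value = test$p.value
  )
}

# Toy data (illustrative only): a clear gap in Marketing, none in Operations.
toy <- data.frame(
  UNIT = rep(c("Marketing", "Operations"), each = 6),
  DESIGNATION = "Analyst",
  SEX = rep(c("F", "F", "F", "M", "M", "M"), times = 2),
  SALARY = c(40000, 41000, 42000, 45000, 46000, 47000,
             44000, 45000, 46000, 44500, 45000, 45500)
)

res <- do.call(rbind, lapply(c("Marketing", "Operations"),
                             function(u) compare_salary_by_sex(toy, u)))
print(res)
```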
Column {data-width=500}
---
```{r}
gender_finance<-ggplot(finance, aes(x=SEX,y=SALARY))+geom_boxplot(fill="blue",color="darkblue")+ labs(title = "Boxplot of finance salary by gender",x="gender",y="salary")
gender_finance
gender_marketing<-ggplot(marketing,aes(x=SEX,y=SALARY))+geom_boxplot(fill="blue",color="darkblue")+labs(title="boxplot of marketing salary by gender",x="gender",y="salary")
gender_marketing
gender_IT<-ggplot(IT,aes(x=SEX,y=SALARY))+geom_boxplot(fill="blue",color="darkblue")+labs(title="boxplot of IT salary by gender",x="gender",y="salary")
gender_IT
gender_management<-ggplot(management,aes(x=SEX,y=SALARY))+geom_boxplot(fill="blue",color="darkblue")+labs(title="boxplot of management salary by gender",x="gender",y="salary")
gender_management
gender_operations<-ggplot(operations,aes(x=SEX,y=SALARY))+geom_boxplot(fill="blue",color="darkblue")+labs(title="boxplot of operations salary by gender",x="gender",y="salary")
gender_operations
gender_web<-ggplot(web,aes(x=SEX,y=SALARY))+geom_boxplot(fill="blue",color="darkblue")+labs(title="boxplot of web salary by gender",x="gender",y="salary")
gender_web
```
Designation by gender in business units{data-orientation=columns}
===
Column {data-width=500}
---
#### Filtering
To understand whether one gender is underrepresented in the data science field, it is important to see how many women there are compared with men in senior analyst positions and in regular analyst positions. To make this comparison, I filtered the dataset into two smaller ones, one containing only senior analysts and one containing only regular analysts. These bar charts show the number of men and women in senior analyst positions and in regular analyst positions.
#### Analysis
The first bar graph shows that there are more men than women in senior analyst positions. However, the second graph shows that there are more women than men among regular analysts. These two graphs suggest that it is harder for women to land a senior analyst job, even if it is not as hard to be hired as a regular analyst. Even though there are more women than men in regular positions, the fact that there are more men in senior positions puts women at a disadvantage because they are underrepresented in the leadership of their company.
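
Because the two groups differ in size, shares are easier to compare than raw counts. The sketch below builds a designation-by-sex table in base R and converts each row to proportions. The `toy` data frame is made up for illustration; in the dashboard the inputs would be `salary$DESIGNATION` and `salary$SEX`.

```r
# Toy data (illustrative only) using the dataset's column names.
toy <- data.frame(
  DESIGNATION = c(rep("Analyst", 5), rep("Senior Analyst", 4)),
  SEX = c("F", "F", "F", "M", "M", "F", "M", "M", "M")
)

# Cross-tabulate designation against sex.
counts <- table(toy$DESIGNATION, toy$SEX)

# margin = 1 divides each row by its total, so every row sums to 1.
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```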
Column {data-width=500}
---
```{r}
senioranalysts<-salary%>%filter(DESIGNATION=="Senior Analyst")
senior_bar_plot<-ggplot(senioranalysts,aes(x=SEX))+
geom_bar()+
labs(title="gender of senior analysts",x="gender",y="number of professionals")
senior_bar_plot
entryanalysts<-salary%>%filter(DESIGNATION=="Analyst")
entry_bar_plot<-ggplot(entryanalysts,aes(x=SEX))+
geom_bar()+
labs(title="gender of regular analysts",x="gender",y="number of professionals")
entry_bar_plot
```
Years of experience & salary {data-orientation=columns}
===
Column {data-width=450}
---
#### Analysts with zero experience
To answer the research question of whether there are higher concentrations of men in data science professions, it is necessary to understand whether men have an easier time getting hired into data positions. For this graph, I filtered the dataset into a smaller one that contains only entry-level analysts in the IT unit with zero years of past experience. This bar chart shows the number of men and women hired into analyst roles with no experience behind them.
#### Analysis
This bar chart clearly shows that men are more likely to be hired as data analysts with no experience behind them. If men are more likely to be hired simply because they are men, women are at a disadvantage when looking for a job in the data science field.
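
A count comparison like this one can be checked against an even split with an exact binomial test. The sketch below uses base R's `binom.test()` on made-up data; `toy_sex` is hypothetical, and in the dashboard the input would be `zeroexp$SEX`. The 50/50 baseline is an assumption, and because the dataset lists employees rather than applicants, the test describes the makeup of the hired group only.

```r
# Toy sex labels for zero-experience analysts (illustrative only).
toy_sex <- c(rep("M", 14), rep("F", 6))

n_men <- sum(toy_sex == "M")
n_total <- length(toy_sex)

# Exact two-sided test of the null hypothesis that men make up half the group.
result <- binom.test(n_men, n_total, p = 0.5)
print(result$estimate)  # observed share of men
print(result$p.value)
```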
Column {data-width=450}
---
```{r}
zeroexp<-salary%>%filter(PAST.EXP==0)%>%filter(DESIGNATION=="Analyst")%>%filter(UNIT=="IT")
zero_exp_bar_plot<-ggplot(zeroexp,aes(x=SEX))+
geom_bar()+
labs(title="gender of regular analysts with no experience",x="gender",y="number of professionals")
zero_exp_bar_plot
```
Other relevant graphs{data-orientation=columns}
===
Column {data-width=450}
---
#### Other relevant variables
Several other variables contribute to salary differences between employees. The two most relevant are age and years of experience, so it is important to understand their effects on salary as well. These two scatterplots show the relationships of age and years of experience with salary.
#### Analysis
Both of these scatterplots clearly show strong, positive relationships between the variables. The older you are and the more experience you have, the higher your salary will be as a data analyst.
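
The strength of these relationships can be put into numbers with a correlation coefficient and a simple linear regression. The sketch below uses base R's `cor()` and `lm()` on made-up data; the `toy` data frame is hypothetical, and in the dashboard `salary` would take its place. A straight-line model is an assumption here, since the smoothed curves in the scatterplots may not be linear.

```r
# Toy data (illustrative only) using the dataset's column names.
toy <- data.frame(
  PAST.EXP = c(0, 1, 2, 3, 5, 7),
  AGE = c(22, 23, 25, 27, 30, 33),
  SALARY = c(40500, 42500, 46500, 48500, 55500, 60500)
)

# Pearson correlations; "complete.obs" skips rows with missing values such as NA ages.
r_exp <- cor(toy$PAST.EXP, toy$SALARY, use = "complete.obs")
r_age <- cor(toy$AGE, toy$SALARY, use = "complete.obs")

# Simple linear regression: the slope estimates the salary change per extra year of experience.
fit <- lm(SALARY ~ PAST.EXP, data = toy)
slope <- unname(coef(fit)["PAST.EXP"])

print(c(r_experience = r_exp, r_age = r_age, slope = slope))
```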
Column {data-width=450}
---
```{r}
age_salary_scatterplot<-ggplot(salary,aes(x=AGE,y=SALARY))+geom_point(color="blue")+geom_smooth(color="red")+theme(text = element_text(size=20))
age_salary_scatterplot
experience_salary_scatterplot<-ggplot(salary,aes(x=PAST.EXP,y=SALARY))+geom_point(color="blue")+geom_smooth(color="red")+theme(text = element_text(size=20))
experience_salary_scatterplot
```
Conclusion
===
#### Conclusion
After analyzing this dataset in several ways, I have come to the following conclusions about my research questions:
#### 1. What are the most common and highest-paying areas of business in which to work as a data professional?
The most common field to work in as a data analyst is IT. This was shown in the first bar graph I made. The highest-paying business units are IT and operations. Senior analysts who have many years of experience in data analysis typically earn the most in operations and IT, as seen in the side-by-side box plots shown earlier.
#### 2. Do men have certain unfair advantages in data science professions?
Overall, men with no data analysis experience are more likely than women to be hired into entry-level data analysis positions. Additionally, men are more likely to be promoted to senior analyst positions. This was shown in the bar graphs I made depicting the number of senior analysts by sex and the number of analysts hired with zero years of experience.
#### 3. Are women disproportionately paid in data science fields?
According to the graphs I made depicting salary differences between men and women in different business units, the only business units in which there is a significant difference between the way women and men are paid are marketing and operations. In the marketing field, men are paid significantly more. Additionally, women are generally paid more than men in the operations field. This shows that the differences in the way men and women are paid depend on the business unit they are working in, because it can go both ways.